Search for: All records

Creators/Authors contains: "Puiu, Daniela"


A genome sequence for the threatened whitebark pine

https://doi.org/10.1093/g3journal/jkae061

Neale, David B.; Zimin, Aleksey V.; Meltzer, Amy; Bhattarai, Akriti; Amee, Maurice; Figueroa Corona, Laura; Allen, Brian J.; Puiu, Daniela; Wright, Jessica; De La Torre, Amanda R.; et al (March 2024, G3: Genes, Genomes, Genetics)

Abstract Whitebark pine (WBP, Pinus albicaulis) is a white pine of subalpine regions in the Western contiguous United States and Canada. WBP has become critically threatened throughout a significant part of its natural range due to mortality from the introduced fungal pathogen white pine blister rust (WPBR, Cronartium ribicola) and additional threats from mountain pine beetle (Dendroctonus ponderosae), wildfire, and maladaptation due to changing climate. Vast acreages of WBP have suffered nearly complete mortality. Genomic technologies can contribute to a faster, more cost-effective approach to the traditional practices of identifying disease-resistant, climate-adapted seed sources for restoration. With deep-coverage Illumina short reads of haploid megagametophyte tissue and Oxford Nanopore long reads of diploid needle tissue, followed by a hybrid, multistep assembly approach, we produced a final assembly containing 27.6 Gb of sequence in 92,740 contigs (N50 537,007 bp) and 34,716 scaffolds (N50 2.0 Gb). Approximately 87.2% (24.0 Gb) of total sequence was placed on the 12 WBP chromosomes. Annotation yielded 25,362 protein-coding genes, and over 77% of the genome was characterized as repeats. WBP has demonstrated the greatest variation in resistance to WPBR among the North American white pines. Candidate genes for quantitative resistance include disease resistance genes known as nucleotide-binding leucine-rich repeat receptors (NLRs). A combination of protein domain alignments and direct genome scanning was employed to fully describe the 3 subclasses of NLRs. Our high-quality reference sequence and annotation provide a marked improvement in NLR identification compared to previous assessments that leveraged de novo-assembled transcriptomes.
Chromosome-Scale Assembly of the Bread Wheat Genome Reveals Thousands of Additional Gene Copies

https://doi.org/10.1534/genetics.120.303501

Alonge, Michael; Shumate, Alaina; Puiu, Daniela; Zimin, Aleksey; Salzberg, Steven L. (October 2020, Genetics)

Bread wheat (Triticum aestivum) is a major food crop and an important plant system for agricultural genetics research. However, due to the complexity and size of its allohexaploid genome, genomic resources are limited compared to other major crops. The IWGSC recently published a reference genome and associated annotation (IWGSC CS v1.0, Chinese Spring) that has been widely adopted and utilized by the wheat community. Although this reference assembly represents all three wheat subgenomes at chromosome-scale, it was derived from short reads, and thus is missing a substantial portion of the expected 16 Gbp of genomic sequence. We earlier published an independent wheat assembly (Triticum_aestivum_3.1, Chinese Spring) that came much closer in length to the expected genome size, although it was only a contig-level assembly lacking gene annotations. Here, we describe a reference-guided effort to scaffold those contigs into chromosome-length pseudomolecules, add in any missing sequence that was unique to the IWGSC CS v1.0 assembly, and annotate the resulting pseudomolecules with genes. Our updated assembly, Triticum_aestivum_4.0, contains 15.07 Gbp of non-gap sequence anchored to chromosomes, which is 1.2 Gbp more than the previous reference assembly. It includes 108,639 genes unambiguously localized to chromosomes, including over 2,000 genes that were previously unplaced. We also discovered more than 5,700 additional gene copies, facilitating the accurate annotation of functional gene duplications including at the Ppd-B1 photoperiod response locus.
Full Text Available
Dissecting the Polygenic Basis of Cold Adaptation Using Genome-Wide Association of Traits and Environmental Data in Douglas-fir

https://doi.org/10.3390/genes12010110

De La Torre, Amanda R.; Wilhite, Benjamin; Puiu, Daniela; St. Clair, John Bradley; Crepeau, Marc W.; Salzberg, Steven L.; Langley, Charles H.; Allen, Brian; Neale, David B. (January 2021, Genes)

Understanding the genomic and environmental basis of cold adaptation is key to understanding how plants survive and adapt to different environmental conditions across their natural range. Univariate and multivariate genome-wide association (GWAS) and genotype-environment association (GEA) analyses were used to test associations among genome-wide SNPs obtained from whole-genome resequencing, measures of growth, phenology, emergence, cold hardiness, and range-wide environmental variation in coastal Douglas-fir (Pseudotsuga menziesii). Results suggest a complex genomic architecture of cold adaptation, in which traits are either highly polygenic or controlled by both large and small effect genes. Newly discovered associations for cold adaptation in Douglas-fir included 130 genes involved in many important biological functions such as primary and secondary metabolism, growth and reproductive development, transcription regulation, stress and signaling, and DNA processes. These genes were related to growth, phenology and cold hardiness and strongly depend on variation in environmental variables such as degree days below 0 °C, precipitation, elevation and distance from the coast. This study is a step forward in our understanding of the complex interconnection between environment and genomics and their role in cold-associated trait variation in boreal tree species, providing a baseline for the species’ predictions under climate change.
Full Text Available
The genome of the American groundhog, Marmota monax

https://doi.org/10.12688/f1000research.25970.1

Puiu, Daniela; Zimin, Aleksey; Shumate, Alaina; Ge, Yuchen; Qiu, Jiabin; Bhaskaran, Manoj; Salzberg, Steven L. (January 2020, F1000Research)

We sequenced the genome of the North American groundhog, Marmota monax, also known as the woodchuck. Our sequencing strategy included a combination of short, high-quality Illumina reads plus long reads generated by both Pacific Biosciences and Oxford Nanopore instruments. Assembly of the combined data produced a genome of 2.74 Gbp in total length, with an N50 contig size of 1,094,236 bp. To annotate the genome, we mapped the genes from another M. monax genome and from the closely related Alpine marmot, Marmota marmota, onto our assembly, resulting in 20,559 annotated protein-coding genes and 28,135 transcripts. The genome assembly and annotation are available in GenBank under BioProject PRJNA587092.
Full Text Available
Semi-automated assembly of high-quality diploid human reference genomes

https://doi.org/10.1038/s41586-022-05325-5

Jarvis, Erich D.; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R.; Porubsky, David; et al (November 2022, Nature)

Abstract The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society 1,2 . However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals 3,4 . Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome 5 . To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity 6 . Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent–child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.
Full Text Available
Recombination between heterologous human acrocentric chromosomes

https://doi.org/10.1038/s41586-023-05976-y

Guarracino, Andrea; Buonaiuto, Silvia; de Lima, Leonardo Gomes; Potapova, Tamara; Rhie, Arang; Koren, Sergey; Rubinstein, Boris; Fischer, Christian; Abel, Haley J.; Antonacci-Fulton, Lucinda L.; et al (May 2023, Nature)

Abstract The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share large homologous regions, including ribosomal DNA repeats and extended segmental duplications 1,2 . Although the resolution of these regions in the first complete assembly of a human genome—the Telomere-to-Telomere Consortium’s CHM13 assembly (T2T-CHM13)—provided a model of their homology 3 , it remained unclear whether these patterns were ancestral or maintained by ongoing recombination exchange. Here we show that acrocentric chromosomes contain pseudo-homologous regions (PHRs) indicative of recombination between non-homologous sequences. Utilizing an all-to-all comparison of the human pangenome from the Human Pangenome Reference Consortium 4 (HPRC), we find that contigs from all of the SAACs form a community. A variation graph 5 constructed from centromere-spanning acrocentric contigs indicates the presence of regions in which most contigs appear nearly identical between heterologous acrocentric chromosomes in T2T-CHM13. Except on chromosome 15, we observe faster decay of linkage disequilibrium in the pseudo-homologous regions than in the corresponding short and long arms, indicating higher rates of recombination 6,7 . The pseudo-homologous regions include sequences that have previously been shown to lie at the breakpoint of Robertsonian translocations 8 , and their arrangement is compatible with crossover in inverted duplications on chromosomes 13, 14 and 21. The ubiquity of signals of recombination between heterologous acrocentric chromosomes seen in the HPRC draft pangenome suggests that these shared sequences form the basis for recurrent Robertsonian translocations, providing sequence and population-based confirmation of hypotheses first developed from cytogenetic studies 50 years ago 9 .
Full Text Available
A draft human pangenome reference

https://doi.org/10.1038/s41586-023-05896-x

Liao, Wen-Wei; Asri, Mobin; Ebler, Jana; Doerr, Daniel; Haukness, Marina; Hickey, Glenn; Lu, Shuangjia; Lucas, Julian K.; Monlong, Jean; Abel, Haley J.; et al (May 2023, Nature)

Abstract Here the Human Pangenome Reference Consortium presents a first draft of the human pangenome reference. The pangenome contains 47 phased, diploid assemblies from a cohort of genetically diverse individuals 1 . These assemblies cover more than 99% of the expected sequence in each genome and are more than 99% accurate at the structural and base pair levels. Based on alignments of the assemblies, we generate a draft pangenome that captures known variants and haplotypes and reveals new alleles at structurally complex loci. We also add 119 million base pairs of euchromatic polymorphic sequences and 1,115 gene duplications relative to the existing reference GRCh38. Roughly 90 million of the additional base pairs are derived from structural variation. Using our draft pangenome to analyse short-read data reduced small variant discovery errors by 34% and increased the number of structural variants detected per haplotype by 104% compared with GRCh38-based workflows, which enabled the typing of the vast majority of structural variant alleles per sample.
Full Text Available
Genome‐wide association identifies candidate genes for drought tolerance in coast redwood and giant sequoia

https://doi.org/10.1111/tpj.15592

De La Torre, Amanda R.; Sekhwal, Manoj K.; Puiu, Daniela; Salzberg, Steven L.; Scott, Alison D.; Allen, Brian; Neale, David B.; Chin, Alana R. O.; Buckley, Thomas N. (December 2021, The Plant Journal)

SUMMARY Drought is a major limitation for survival and growth in plants. With more frequent and severe drought episodes occurring due to climate change, it is imperative to understand the genomic and physiological basis of drought tolerance to be able to predict how species will respond in the future. In this study, univariate and multitrait multivariate genome‐wide association study methods were used to identify candidate genes in two iconic and ecosystem‐dominating species of the western USA, coast redwood and giant sequoia, using 10 drought‐related physiological and anatomical traits and genome‐wide sequence‐capture single nucleotide polymorphisms. Population‐level phenotypic variation was found in carbon isotope discrimination, osmotic pressure at full turgor, xylem hydraulic diameter, and total area of transporting fibers in both species. Our study identified 78 new marker × trait associations in coast redwood and six in giant sequoia, with genes involved in a range of metabolic, stress, and signaling pathways, among other functions. This study contributes to a better understanding of the genomic basis of drought tolerance in long‐generation conifers and helps guide current and future conservation efforts in the species.
Genomic basis of white pine blister rust quantitative disease resistance and its relationship with qualitative resistance

https://doi.org/10.1111/tpj.14928

Weiss, Matthew; Sniezko, Richard A.; Puiu, Daniela; Crepeau, Marc W.; Stevens, Kristian; Salzberg, Steven L.; Langley, Charles H.; Neale, David B.; De La Torre, Amanda R. (October 2020, The Plant Journal)

Full Text Available
A Reference Genome Sequence for Giant Sequoia

https://doi.org/10.1534/g3.120.401612

Scott, Alison D; Zimin, Aleksey V; Puiu, Daniela; Workman, Rachael; Britton, Monica; Zaman, Sumaira; Caballero, Madison; Read, Andrew C; Bogdanove, Adam J; Burns, Emily; et al (November 2020, G3 Genes|Genomes|Genetics)

Abstract The giant sequoia (Sequoiadendron giganteum) of California are massive, long-lived trees that grow along the U.S. Sierra Nevada mountains. Genomic data are limited in giant sequoia and producing a reference genome sequence has been an important goal to allow marker development for restoration and management. Using deep-coverage Illumina and Oxford Nanopore sequencing, combined with Dovetail chromosome conformation capture libraries, the genome was assembled into eleven chromosome-scale scaffolds containing 8.125 Gbp of sequence. Iso-Seq transcripts, assembled from three distinct tissues, were used as evidence to annotate a total of 41,632 protein-coding genes. The genome was found to contain, distributed unevenly across all 11 chromosomes and in 63 orthogroups, over 900 complete or partial predicted NLR genes, of which 375 are supported by annotation derived from protein evidence and gene modeling. This giant sequoia reference genome sequence represents the first genome sequenced in the Cupressaceae family, and lays a foundation for using genomic tools to aid in giant sequoia conservation and management.
Full Text Available
